A High Quality Partial Parser for Annotating German Text Corpora

Author

  • Stefan Klatt
Abstract

This paper presents a two-stage partial parser for untagged German sentences. In the first stage, each sentence is segmented into units that are easier to parse, following the Topological Field Model. In the second stage, minimal NPs, DPs and PPs, as well as nominal multiword units, are identified within each recognized field. This paper discusses the results of the second stage. We evaluated 500 parsed sentences from a newspaper corpus. The recall and precision rates achieved are higher than those reported in the literature for comparable systems.
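The abstract reports recall and precision for the recognized phrases but does not spell out how a recognized phrase is matched against the reference annotation. As a minimal sketch, assuming each phrase is represented as a `(start, end, label)` token span (end exclusive) and that a predicted phrase counts as correct only when its boundaries and label exactly match a gold phrase, the two rates could be computed as below. The function name `span_prf`, the labels and the example spans are hypothetical; the paper's own evaluation procedure may differ.

```python
from collections import Counter


def span_prf(gold, predicted):
    """Span-level precision, recall and F1 (hypothetical helper).

    gold, predicted: iterables of (start, end, label) tuples.
    Assumed matching criterion: exact boundaries AND identical label.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection, so each gold phrase can be matched at most once.
    correct = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Tokens: 0 Der, 1 Mann, 2 sah, 3 den, 4 Hund, 5 im, 6 Park
gold = [(0, 2, "NP"), (3, 5, "NP"), (5, 7, "PP")]
# The parser output has one boundary error: "Hund" instead of "den Hund".
pred = [(0, 2, "NP"), (4, 5, "NP"), (5, 7, "PP")]

p, r, f = span_prf(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```

Exact-match scoring is the strictest common choice; a partial-overlap criterion would credit the boundary error above and yield higher rates.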

Publication date: 2004